-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ascii::Char
(ACP#179)
#111009
Add ascii::Char
(ACP#179)
#111009
Conversation
r? @cuviper (rustbot has picked a reviewer for you, use r? to override) |
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of
|
This comment has been minimized.
This comment has been minimized.
Should |
@pitaj Eventually that seems perfectly reasonable. But that needs an error type, so I left them out of the first PR. (We can always add more later, especially while it's unstable.) |
library/core/src/ascii/ascii_char.rs
Outdated
#[inline] | ||
pub const fn as_char(self) -> char { | ||
// SAFETY: Everything in the ASCII block is a valid USV. | ||
unsafe { char::from_u32_unchecked(self as u32) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This unsafe block can be avoided by doing char::from(self.as_u8())
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! I should think more about my copy-pastes ;)
Updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, wait, const
. No char::from
in a const fn
. Changed this to a FIXME
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as
casting works however:
unsafe { char::from_u32_unchecked(self as u32) } | |
self.as_u8() as char |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right, yet another problem with as
: I forget it works and it's not something with a signature in rustdoc to remind me 😬
#[must_use] | ||
#[unstable(feature = "ascii_char", issue = "110998")] | ||
#[inline] | ||
pub const fn as_ascii(&self) -> Option<ascii::Char> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this take &self
and not self
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that self
would make more sense, but I think that for the existing is_ascii
above too.
So I made this &self
for consistency with the existing methods here. I'll add a note to the tracking issue to make sure that it's discussed further before stabilization.
#[must_use] | ||
#[unstable(feature = "ascii_char", issue = "110998")] | ||
#[inline] | ||
pub const fn as_ascii(&self) -> Option<ascii::Char> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
This comment has been minimized.
This comment has been minimized.
library/core/src/ascii/ascii_char.rs
Outdated
use crate::mem::transmute; | ||
|
||
/// One of the 128 Unicode characters from U+0000 through U+007F, | ||
/// often known as the [ASCII]subset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// often known as the [ASCII]subset. | |
/// often known as the [ASCII] subset. |
library/core/src/ascii/ascii_char.rs
Outdated
#[inline] | ||
pub const fn as_char(self) -> char { | ||
// SAFETY: Everything in the ASCII block is a valid USV. | ||
unsafe { char::from_u32_unchecked(self as u32) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as
casting works however:
unsafe { char::from_u32_unchecked(self as u32) } | |
self.as_u8() as char |
/// or returns `None` if it's too large. | ||
#[unstable(feature = "ascii_char", issue = "110998")] | ||
#[inline] | ||
pub const fn from_u8(b: u8) -> Option<Self> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bikeshedding incoming: Why not call the methods from_byte
/as_byte
? There is no stable precedent in the standard library for including the numeric type name in methods and from_byte
is IMHO much simpler to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is, actually: https://doc.rust-lang.org/std/primitive.char.html#method.from_u32
So since it's char::from_u32
, I figured that ascii::Char::from_u8
was the most consistent here.
But drop a note in the tracking issue if you feel strongly. I think that's a better place for bikeshedding discussions, so they'll be seen at (eventual) stabilization time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I totally missed that one! Seems very reasonable to rhyme with it. Bikeshedding complete 😉.
library/core/src/ascii/ascii_char.rs
Outdated
//! because it avoids a whole bunch of "are you sure you didn't mean `char`?" | ||
//! suggestions from rustc if you get anything slightly wrong in here, and overall | ||
//! helps with clarity as we're also referring to `char` intentionally in here. | ||
//! Maybe that means that the stuttering name would be better for the export too? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'm conflicted on this personally too. I agree with the decision at least in terms of this module internally, but I would personally very much like to just use use std::ascii;
and ascii::Char
instead of having to write out use std::ascii::AsciiChar;
. It's an interesting stylistic point.
/// U+007F | ||
#[unstable(feature = "ascii_char_variants", issue = "110998")] | ||
Delete = 127, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with landing it like this, but I find that an enum like this is super weird. I'll post more of a comment on the tracking issue because I don't think I fully understand the trade offs here yet.
/// [block]: https://www.unicode.org/glossary/index.html#block | ||
/// [chart]: https://www.unicode.org/charts/PDF/U0000.pdf | ||
/// [NIST FIPS 1-2]: https://nvlpubs.nist.gov/nistpubs/Legacy/FIPS/fipspub1-2-1977.pdf | ||
/// [NamesList]: https://www.unicode.org/Public/15.0.0/ucd/NamesList.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't have to do it now, but I would like to see a "Why would I use this?" section before stabilization. :-)
#[inline] | ||
pub const unsafe fn as_ascii_unchecked(&self) -> &[ascii::Char; N] { | ||
let byte_ptr: *const [u8; N] = self; | ||
let ascii_ptr = byte_ptr as *const [ascii::Char; N]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we can use cast()
, right? We can also use the newly added ptr::from_ref()
.
I think this is ready for review for nightly now: There's lots more to do here before it'd be ready for stabilization, but this is already 700+ lines, so I'd like to move those things (like putting examples on everything) into later PRs. |
Ship it. @bors r+ |
…iaskrgr Rollup of 8 pull requests Successful merges: - rust-lang#108865 (Add a `sysroot` crate to represent the standard library crates) - rust-lang#110651 (libtest: include test output in junit xml reports) - rust-lang#110826 (Make PlaceMention a non-mutating use.) - rust-lang#110982 (Do not recurse into const generic args when resolving self lifetime elision.) - rust-lang#111009 (Add `ascii::Char` (ACP#179)) - rust-lang#111100 (check array type of repeat exprs is wf) - rust-lang#111186 (Add `is_positive` method for signed non-zero integers.) - rust-lang#111201 (bootstrap: add .gitmodules to the sources) Failed merges: - rust-lang#110954 (Reject borrows of projections in ConstProp.) r? `@ghost` `@rustbot` modify labels: rollup
ACP second: rust-lang/libs-team#179 (comment)
New tracking issue: #110998
For now this is an
enum
as @kupiakos suggested, with the variants under a different feature flag.There's lots more things that could be added here, and place for further doc updates, but this seems like a plausible starting point PR.
I've gone through and put an
as_ascii
next to everyis_ascii
: onu8
,char
,[u8]
, andstr
.As a demonstration, made a commit updating some formatting code to use this: scottmcm@ascii-char-in-fmt (I don't want to include that in this PR, though, because that brings in perf questions that don't exist if this is just adding new unstable APIs.)